Incremental Image Labeling via Iterative Refinement
Data quality is critical for multimedia tasks, yet various types of systematic flaws have been found in image benchmark datasets, as discussed in recent work. In particular, the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias further degrades performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology that provides guidelines driving the labeling process, thereby indirectly introducing the intended semantics into ML models. Specifically, we propose an iterative refinement-based annotation method that optimizes data labeling by organizing objects into a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.
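The refinement idea described above can be sketched as follows: an annotator's coarse label is only specialized to a finer class in the hierarchy when the image's visible properties support it. All class names and required properties below are illustrative assumptions, not the paper's actual hierarchy.

```python
# Hypothetical sketch of hierarchy-guided label refinement: descend the
# classification hierarchy only when visual evidence supports a finer class.
HIERARCHY = {
    "animal": {"dog": {}, "cat": {}},
    "vehicle": {"car": {}, "bicycle": {}},
}

# Toy property index: finer classes and the visual evidence they require.
PROPERTIES = {"dog": {"fur", "snout"}, "cat": {"fur", "whiskers"},
              "car": {"wheels", "windshield"}, "bicycle": {"wheels", "pedals"}}

def refine_label(coarse_label, visible_properties, property_index):
    """Move one level down the hierarchy only when the image's visible
    properties support the finer class; otherwise keep the coarse label."""
    children = HIERARCHY.get(coarse_label, {})
    for child in children:
        required = property_index.get(child, set())
        if required and required <= visible_properties:
            return child
    return coarse_label

print(refine_label("animal", {"fur", "snout"}, PROPERTIES))  # → dog
print(refine_label("animal", {"fur"}, PROPERTIES))           # → animal
```

Iterating this step over successive annotation rounds aligns each object's final label with what is visually verifiable, which is the alignment between images and linguistic descriptions the abstract refers to.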
RCRN: Real-world Character Image Restoration Network via Skeleton Extraction
Constructing high-quality character image datasets is challenging because
real-world images are often affected by image degradation. There are
limitations when applying current image restoration methods to such real-world
character images, since (i) the categories of noise in character images are
different from those in general images; (ii) real-world character images
usually contain more complex image degradation, e.g., mixed noise at different
noise levels. To address these problems, we propose a real-world character
restoration network (RCRN) to effectively restore degraded character images,
where character skeleton information and scale-ensemble feature extraction are
utilized to obtain better restoration performance. The proposed method consists
of a skeleton extractor (SENet) and a character image restorer (CiRNet). SENet
aims to preserve the structural consistency of the character and normalize
complex noise. Then, CiRNet reconstructs clean images from degraded character
images and their skeletons. Due to the lack of benchmarks for real-world
character image restoration, we constructed a dataset containing 1,606
character images with real-world degradation to evaluate the validity of the
proposed method. The experimental results demonstrate that RCRN outperforms
state-of-the-art methods quantitatively and qualitatively.
Comment: Accepted to ACM MM 202
CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising
Degraded images are common among general sources of character images, leading to unsatisfactory character recognition results. Existing methods have made dedicated efforts to restore degraded character images. However, the denoising results obtained by these methods do not appear to improve character recognition performance. This is mainly because current methods focus only on pixel-level information and ignore critical features of a character, such as its glyph, resulting in character-glyph damage during the denoising process. In this paper, we introduce a novel generic framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images without changing their inherent glyphs. Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone, which maintains the consistency of character glyphs during character image denoising. Moreover, we utilize attention-based networks for global-local feature interaction, which helps handle blind denoising and enhances denoising performance. We compare CharFormer with state-of-the-art methods on multiple datasets. The experimental results show the superiority of CharFormer quantitatively and qualitatively.
Comment: Accepted by ACM MM 202
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, such that newly developed literature search techniques can be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180,000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields, or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency, and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
Peer reviewed
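One of the baselines named above, TF-IDF with cosine similarity, can be sketched in a few lines; this is a minimal toy over made-up token lists, not the consortium's actual implementation or its BM25/PubMed Related Articles baselines.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "protein folding dynamics simulation".split(),   # seed article
    "protein structure prediction simulation".split(),
    "wind lidar doppler measurement".split(),
]
vecs = tfidf_vectors(docs)
# The seed (doc 0) should rank the related abstract (doc 1)
# above the unrelated one (doc 2).
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print(scores[0] > scores[1])
```

A benchmark like RELISH evaluates exactly such rankings against the expert relevance annotations, which is what allows the baselines to be compared offline.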
Multi-Term Attention Networks for Skeleton-Based Action Recognition
The same action can take different amounts of time in different instances, and this difference affects the accuracy of action recognition to a certain extent. We propose an end-to-end deep neural network called “Multi-Term Attention Networks” (MTANs), which solves the above problem by extracting temporal features at different time scales. The network consists of a Multi-Term Attention Recurrent Neural Network (MTA-RNN) and a Spatio-Temporal Convolutional Neural Network (ST-CNN). In MTA-RNN, a method for fusing multi-term temporal features is proposed to extract temporal dependencies at different time scales, and the weighted fused temporal feature is recalibrated by an attention mechanism. Ablation studies show that this network has powerful spatio-temporal dynamic modeling capabilities for actions at different time scales. We perform extensive experiments on four challenging benchmark datasets: the NTU RGB+D, UT-Kinect, Northwestern-UCLA, and UWA3DII datasets. Our method achieves better results than state-of-the-art methods, which demonstrates the effectiveness of MTANs.
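The multi-term fusion idea can be illustrated with a small NumPy sketch: per-frame features are pooled over windows of several lengths (the "terms"), and the per-scale features are fused with softmax attention weights. The window sizes and random stand-in features are assumptions for illustration; the paper's actual networks are learned RNN/CNN modules.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 32, 8                       # frames, feature dimension
frames = rng.normal(size=(T, D))   # stand-in for per-frame skeleton features

def multi_term_features(x, scales=(2, 4, 8)):
    """Average-pool the sequence at several temporal scales ("terms")."""
    return np.stack([
        x.reshape(T // s, s, D).mean(axis=1).mean(axis=0)  # one D-vector per scale
        for s in scales
    ])

def attention_fuse(terms, scores):
    """Recalibrate per-scale features with softmax attention weights."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return (w[:, None] * terms).sum(axis=0)

terms = multi_term_features(frames)               # shape (3, D)
fused = attention_fuse(terms, rng.normal(size=3)) # shape (D,)
print(fused.shape)
```

In the actual MTA-RNN, the attention scores are learned from the data rather than drawn at random, so the network can emphasize whichever time scale best matches the duration of the observed action.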
Optimization of positioning technology of aspheric compound eyes with variable focal length
Because a single non-uniform-surface compound eye cannot achieve zoom imaging, resulting in poor imaging quality and other issues, a new type of aspherical artificial compound eye structure with variable focal length is proposed in this paper. The structure divides the curved compound eye into three fan-shaped areas, and the different focal lengths of the micro-lenses in different areas allow the artificial compound eye to zoom within a certain range. The focal length and size of each micro-lens are determined by the area it belongs to and its location. The aspherical array of micro-lenses is optimized so that the spherical aberration in each area is reduced to one percent of its initial value. Simulation analysis shows that the designed artificial compound eye structure can realize focal-length adjustment and effectively mitigate the poor imaging quality at the edge of the curved compound eye. An aspherical artificial compound eye sample with n = 61 sub-eyes and a base diameter of 8.66 mm was fabricated using the molding method. The mutual relationships between the sub-eyes were calibrated, and a mathematical model for the simultaneous identification of multiple sub-eyes was established. An experimental artificial compound eye positioning system with an error of less than 10% was set up, in which multiple micro-lenses capture the target point and its coordinates are computed.
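The standard relations behind such a micro-lens design, the sag of a conic/aspheric surface and the paraxial focal length of a plano-convex lens, can be sketched as below. The radius, conic constant, and refractive index are illustrative assumptions, not the paper's design values.

```python
import math

def sag(r, R, k):
    """Sag of a conic surface: z(r) = c r^2 / (1 + sqrt(1 - (1+k) c^2 r^2)),
    with curvature c = 1/R and conic constant k (k = 0 is a sphere)."""
    c = 1.0 / R
    return c * r**2 / (1.0 + math.sqrt(1.0 - (1.0 + k) * c**2 * r**2))

def paraxial_focal_length(R, n):
    """Paraxial focal length of a plano-convex lens in air: f = R / (n - 1)."""
    return R / (n - 1.0)

R, k, n = 2.0, -0.5, 1.49           # mm, conic constant, PMMA-like index
print(round(sag(0.5, R, k), 4))     # surface sag at r = 0.5 mm
print(paraxial_focal_length(R, n))  # focal length ≈ 4.08 mm
```

Choosing a different radius (and hence focal length) per fan-shaped area, as the paper does, is what gives the compound eye its variable focal length.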
All-Fiber Airborne Coherent Doppler Lidar to Measure Wind Profiles
An all-fiber airborne pulsed coherent Doppler lidar (CDL) prototype at 1.54 μm is developed to measure wind profiles in the lower troposphere. The all-fiber single-frequency pulsed laser is operated with a pulse energy of 300 μJ, a pulse width of 400 ns, and a pulse repetition rate of 10 kHz. To the best of our knowledge, this is the highest pulse energy of any all-fiber eye-safe single-frequency laser used in an airborne coherent wind lidar. The telescope optical diameter of the monostatic lidar is 100 mm. Velocity-Azimuth Display (VAD) scanning is implemented with a 20-degree elevation angle at 8 different azimuths. A real-time signal processing board is developed to acquire and process the heterodyne mixing signal, with the spectra of 10,000 pulses accumulated every second. Wind profiles are obtained every 20 seconds. Several experiments were carried out to evaluate the performance of the lidar. We successfully performed airborne wind lidar experiments, and the resulting wind profiles were compared with those from an aerological theodolite and a ground-based wind lidar. A wind speed standard error of less than 0.4 m/s is observed between the airborne wind lidar and the balloon aerological theodolite.
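The VAD retrieval described above can be sketched as a least-squares fit: with radial velocities measured at a fixed 20-degree elevation over 8 azimuths, the wind components (u, v, w) follow from the model v_r = u·sin(az)·cos(el) + v·cos(az)·cos(el) + w·sin(el). The synthetic "true" wind below is an assumption used to demonstrate the fit, not data from the paper.

```python
import numpy as np

el = np.deg2rad(20.0)                    # elevation angle, as in the lidar
az = np.deg2rad(np.arange(0, 360, 45))   # 8 azimuths, as in the lidar

u, v, w = 5.0, -3.0, 0.2                 # synthetic wind for the demo (m/s)
v_r = (u * np.sin(az) * np.cos(el)
       + v * np.cos(az) * np.cos(el)
       + w * np.sin(el))                 # simulated radial velocities

# Design matrix of the VAD model; least squares recovers (u, v, w).
A = np.column_stack([np.sin(az) * np.cos(el),
                     np.cos(az) * np.cos(el),
                     np.full_like(az, np.sin(el))])
wind, *_ = np.linalg.lstsq(A, v_r, rcond=None)
print(np.round(wind, 3))                 # recovers u, v, w
```

With equally spaced azimuths the three columns of the design matrix are mutually orthogonal over a full scan, which is why the 8-azimuth VAD geometry yields a well-conditioned fit even in the presence of noise.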